Sains Malaysiana 52(7)(2023): 1901-1914
http://doi.org/10.17576/jsm-2023-5207-01
RFE-Based Feature Selection to Improve Classification Accuracy for Morphometric Analysis of Craniodental Characters of House Rats
(Pemilihan Ciri Berasaskan RFE untuk Meningkatkan Ketepatan Pengelasan dalam Analisis Morfometri Sifat Kraniodental Tikus Rumah)
ANEESHA BALACHANDRAN PILLAY1, DHARINI PATHMANATHAN1*, ARPAH ABU2 & HASMAHZAITI OMAR2
1Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur, Malaysia
2Institute of Biological Sciences, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur, Malaysia
Received: 24 October 2022/Accepted: 26 June 2023
Abstract
In conventional morphometrics, researchers often collect and analyze large numbers of morphometric features to study shape variation among biological organisms. Feature selection is a fundamental tool in machine learning that removes irrelevant and redundant features. Recursive feature elimination (RFE) is a popular feature selection technique that reduces data dimensionality and selects a subset of attributes based on a ranking of predictor importance. In this study, we performed RFE on craniodental measurements of Rattus rattus to select the best feature subset for both males and females. We also compared three machine learning algorithms, namely Naïve Bayes, Random Forest, and Artificial Neural Network, using all features and the RFE-selected features to classify the R. rattus sample by age group. The Artificial Neural Network provided the best accuracy among the three predictive classification models.
Keywords: ANN, machine learning, naïve Bayes, recursive feature elimination, traditional morphometrics
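The RFE loop described in the abstract (fit a model, rank the predictors by importance, drop the weakest, repeat) can be illustrated with a minimal sketch. This is not the procedure used in the study: the nearest-centroid classifier, the importance score (training accuracy lost when a feature is ignored), the three age groups, and the synthetic four-column data set are all assumptions chosen only to keep the example self-contained.

```python
import random
from statistics import mean

def fit_centroids(X, y, cols):
    """Hypothetical stand-in model: one mean vector per class over `cols`."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]
    return centroids

def accuracy(centroids, X, y, cols, skip=None):
    """Training accuracy of nearest-centroid prediction; `skip` ignores one column."""
    hits = 0
    for x, lab in zip(X, y):
        def dist(label):
            return sum((x[c] - m) ** 2
                       for c, m in zip(cols, centroids[label]) if c != skip)
        hits += min(centroids, key=dist) == lab
    return hits / len(y)

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, rank, drop the weakest, repeat."""
    cols = list(range(len(X[0])))
    eliminated = []
    while len(cols) > n_keep:
        model = fit_centroids(X, y, cols)
        base = accuracy(model, X, y, cols)
        # Assumed importance score: accuracy lost when the feature is ignored.
        importance = {c: base - accuracy(model, X, y, cols, skip=c) for c in cols}
        worst = min(cols, key=lambda c: importance[c])
        cols.remove(worst)
        eliminated.append(worst)
    return cols, eliminated

# Synthetic data (not real measurements): three hypothetical age groups.
# Column 0 separates group 0 from the rest, column 1 separates group 2 from
# the rest, and columns 2 and 3 are pure noise.
random.seed(0)
group_means = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (10.0, 10.0)}
X, y = [], []
for group, (m0, m1) in group_means.items():
    for _ in range(30):
        X.append([random.gauss(m0, 1), random.gauss(m1, 1),
                  random.gauss(0, 1), random.gauss(0, 1)])
        y.append(group)

selected, eliminated = rfe(X, y, n_keep=2)
print("kept columns:", selected)
print("eliminated (in order):", eliminated)
```

In this toy setting the two noise columns carry no group information, so they are the ones removed. A comparison of classifiers trained on all columns versus the retained columns, as in the study, would follow the same pattern with the selected column indices.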
Abstrak
Dalam morfometri konvensional, para penyelidik sering mengumpul dan menganalisis data dengan menggunakan bilangan ciri yang besar untuk mengkaji variasi bentuk antara organisma biologi. Pemilihan ciri memainkan peranan penting dalam algoritma pembelajaran mesin untuk mengeluarkan ciri-ciri yang tidak relevan dan berlebihan. Penghapusan ciri rekursif (RFE) merupakan kaedah pemilihan ciri terkenal yang boleh mengurangkan dimensi data dan juga boleh membantu memilih subset sifat berdasarkan kedudukan kepentingan peramal. Dalam kajian ini, kita menjalankan RFE pada ukuran kraniodental linear bagi data Rattus rattus untuk memilih subset ciri terbaik bagi kedua-dua tikus jantan dan betina. Kita telah menjalankan kajian perbandingan berdasarkan tiga algoritma pembelajaran mesin iaitu Bayes Naif, Hutan Rawak dan Rangkaian Neural Tiruan menggunakan semua ciri dan ciri terpilih secara RFE untuk mengelaskan sampel R. rattus berdasarkan kumpulan umur. Setelah memantau hasil nilai ketepatan yang diperoleh bagi ketiga-tiga model tersebut, Rangkaian Neural Tiruan telah terbukti memberi ketepatan yang terbaik antara ketiga-tiga model ini.
Kata kunci: ANN; Bayes naif; morfometri tradisi; pembelajaran mesin; penghapusan ciri rekursif
*Corresponding author; email: dharini@um.edu.my